Chapter 1 - Information Representation

1.1 Data Representation

Number Systems →

Denary is a base-10 system, with 10 symbols (0-9)
Binary is a base-2 system, with 2 symbols (0 & 1). Binary digits are referred to as bits, and all data is manipulated and stored in a computer using binary code
Hexadecimal (hex) is a base-16 system, with 16 symbols (0-9 & A-F). Hex is used for RGB colour codes, IPv6, MAC addresses, error codes

Hex Codes →

A = 10
B = 11
C = 12
D = 13
E = 14
F = 15
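
Converting between the three bases can be sketched in Python. This is a minimal illustration using repeated division; the function name to_base is hypothetical and not part of the notes.

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative denary integer to a string in base 2 to 16."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        out = digits[n % base] + out  # each remainder is the next digit, right to left
        n //= base
    return out

print(to_base(202, 2))   # 11001010
print(to_base(202, 16))  # CA
print(int("CA", 16))     # 202 (built-in conversion back to denary)
```

Each hex digit corresponds to exactly one nibble, so CA is 1100 1010.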

Binary Prefixes →

1 bit = 1 binary digit (0 or 1)
1 nibble = 4 bits
1 byte = 8 bits
1 kibibyte (KiB) = 1024 bytes
1 mebibyte (MiB) = 1024 KiB
1 gibibyte (GiB) = 1024 MiB
1 tebibyte (TiB) = 1024 GiB

Denary prefixes →

Kilo = 10^3
Mega = 10^6
Giga = 10^9
Tera = 10^12
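
The gap between binary and denary prefixes grows with size. A short sketch (the constant names and the 500 GB drive are illustrative assumptions):

```python
KIB = 1024        # kibibyte: 2**10 bytes
GIB = 1024 ** 3   # gibibyte: 2**30 bytes
KB = 10 ** 3      # kilobyte: denary prefix
GB = 10 ** 9      # gigabyte: denary prefix

print(KIB - KB)                   # 24
print(GIB - GB)                   # 73741824
print(round(500 * GB / GIB, 2))   # 465.66 (a "500 GB" drive expressed in GiB)
```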

Binary Addition & Subtraction →

Addition →

0 + 0 = 0
1 + 0 = 1
1 + 1 = 10
1 + 1 + 1 = 11

Sometimes an error called overflow occurs: the answer needs more bits than are available, so it cannot be represented in the current number of bits (e.g. adding two 8-bit numbers gives a 9-bit result)
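
Overflow can be illustrated with a small sketch of fixed-width unsigned addition. The function add_unsigned and the 8-bit default are assumptions chosen for the example.

```python
def add_unsigned(a: int, b: int, bits: int = 8):
    """Add two unsigned integers in a fixed number of bits.

    Returns the stored bit pattern and whether overflow occurred.
    """
    total = a + b
    overflow = total >= 2 ** bits   # the answer needs more than `bits` bits
    stored = total % 2 ** bits      # only the lowest `bits` bits are kept
    return format(stored, f"0{bits}b"), overflow

print(add_unsigned(108, 23))    # ('10000011', False)  108 + 23 = 131
print(add_unsigned(200, 100))   # ('00101100', True)   300 does not fit in 8 bits
```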

Subtraction →

Convert the second number (the one being subtracted) into two’s complement, then add the two numbers and discard any carry out of the MSB. (see later in the chapter for two’s complement)
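
A minimal sketch of this method, assuming 8-bit numbers; the helper names negate and subtract are hypothetical.

```python
def negate(value: int, bits: int = 8) -> int:
    """Return the two's complement bit pattern of -value, as an unsigned int."""
    mask = 2 ** bits - 1
    return (~value + 1) & mask   # invert every bit, add 1, keep `bits` bits

def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b by adding the two's complement of b."""
    mask = 2 ** bits - 1
    return (a + negate(b, bits)) & mask   # the mask discards the carry out

print(format(negate(5), "08b"))        # 11111011
print(format(subtract(13, 5), "08b"))  # 00001000 (denary 8)
```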

Internal Coding of Numbers →

Byte: 8 bits treated as a single unit. Values are 0 to 2^8 - 1 (0 to 255)
Unsigned integer: simply a binary number; always positive
Signed integer: can be positive or negative. The left-most bit is called the most significant bit (MSB), and it determines whether the number is positive or negative (1 = negative, 0 = positive)

Sign and Magnitude →

MSB is used for the sign (positive or negative) and the rest is the value
Range (8 bits): -127 to +127
Range is reduced because there are two zeroes (positive and negative zero)
Overflow & calculation errors occur
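
Sign and magnitude can be sketched as below, assuming 8 bits. The function names are hypothetical. Note that two different patterns decode to zero.

```python
def to_sign_magnitude(n: int, bits: int = 8) -> str:
    """Encode n with the MSB as the sign and the remaining bits as the magnitude."""
    limit = 2 ** (bits - 1) - 1            # 127 for 8 bits
    if abs(n) > limit:
        raise ValueError("value out of range")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")

def from_sign_magnitude(pattern: str) -> int:
    """Decode a sign and magnitude bit pattern back to denary."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude

print(to_sign_magnitude(5))    # 00000101
print(to_sign_magnitude(-5))   # 10000101
print(from_sign_magnitude("00000000"), from_sign_magnitude("10000000"))  # 0 0
```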

One’s Complement →

To make a number negative, every bit is inverted: 1s become 0s and 0s become 1s
Range (8 bits): -127 to +127
Positive and negative zeroes, and calculation errors occur
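
A short sketch of the inversion, working on bit patterns held as strings (the function name is an assumption):

```python
def ones_complement(pattern: str) -> str:
    """Invert every bit of a binary string."""
    return "".join("1" if bit == "0" else "0" for bit in pattern)

print(ones_complement("00000101"))  # 11111010 (represents -5)
print(ones_complement("00000000"))  # 11111111 (negative zero)
```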

Two’s Complement →

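To make a number negative: take the one’s complement and add 1
Alternative method: starting from the right, copy the zeroes up to and including the first 1, then invert the remaining bits
Increased range (8 bits): -128 to +127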
Only one zero
Easier calculations, with reduced errors
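
Both ways of forming a two's complement value give the same bit pattern. A minimal sketch on binary strings, with hypothetical function names:

```python
def invert(pattern: str) -> str:
    return "".join("1" if bit == "0" else "0" for bit in pattern)

def twos_by_invert_add(pattern: str) -> str:
    """One's complement, then add 1 (result wraps to the same width)."""
    bits = len(pattern)
    value = (int(invert(pattern), 2) + 1) % 2 ** bits
    return format(value, f"0{bits}b")

def twos_by_copy(pattern: str) -> str:
    """Copy from the right up to and including the first 1, invert the rest."""
    last_one = pattern.rfind("1")
    if last_one == -1:
        return pattern                      # all zeroes: zero negates to itself
    return invert(pattern[:last_one]) + pattern[last_one:]

def to_denary(pattern: str) -> int:
    """Decode a two's complement pattern; the MSB has a negative place value."""
    value = int(pattern, 2)
    return value - 2 ** len(pattern) if pattern[0] == "1" else value

print(twos_by_invert_add("00010100"))  # 11101100
print(twos_by_copy("00010100"))        # 11101100
print(to_denary("11101100"))           # -20
print(to_denary("10000000"))           # -128
```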

Binary Coded Decimal →

Uses four bits/one nibble to represent each denary digit
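
For illustration, a sketch that encodes each denary digit as its own nibble (the function names to_bcd and from_bcd are assumptions, and only non-negative integers are handled):

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per denary digit."""
    return " ".join(format(int(digit), "04b") for digit in str(n))

def from_bcd(bcd: str) -> int:
    """Decode space-separated BCD nibbles back to a denary integer."""
    return int("".join(str(int(nibble, 2)) for nibble in bcd.split()))

print(to_bcd(259))                 # 0010 0101 1001
print(from_bcd("0010 0101 1001"))  # 259
print(format(259, "b"))            # 100000011 (pure binary needs 9 bits, BCD uses 12)
```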

Pros →

Easier to convert from denary to BCD, so this makes it easier to encode and decode
Easier to understand and implement in hardware
Can represent large numbers or monetary values accurately
Bits 1010 - 1111 can be used for other characters

Cons →

There is no standard for bits 1010 - 1111
Less efficient
Complicates calculations

Uses →

Financial institutions - representing monetary values
Electronic calculators and LED displays
Date and time in BIOS of PC
Latitude and longitude
Barcodes (MSI)
Accurately represent decimals and fractions; any use when numbers are electronically coded

Coding of Text →

Text coding needs a character set: the complete set of symbols that computer hardware and software can use and recognise. Each symbol is represented by a unique code (a bit pattern or natural number).

ASCII →

7 bits per character, giving 2^7 = 128 characters. Extended ASCII uses 8 bits, giving 256 characters
Only supports characters from the English language, so its main downside is that it can't represent other languages
Punctuation, uppercase and lowercase have their own symbols
When a key is pressed, it produces the binary number assigned to that key. The CPU uses the ASCII character set to convert the binary number to a character, which is displayed on the screen
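
The mapping between characters and codes can be seen with Python's built-in ord and chr. The helper ascii_bits is a hypothetical name for this sketch.

```python
def ascii_bits(text: str) -> list:
    """Return the 7-bit ASCII pattern for each character in text."""
    return [format(ord(ch), "07b") for ch in text]

print(ord("H"), ord("i"), ord("!"))   # 72 105 33
print(ascii_bits("Hi!"))              # ['1001000', '1101001', '0100001']
print(chr(0b1000001))                 # A
print(ord("a") - ord("A"))            # 32 (upper and lower case have separate codes)
```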

Unicode →

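Originally 16 bits per character, giving 2^16 = 65,536 characters. Modern Unicode encodings (e.g. UTF-8, UTF-16) use a variable number of bytes and cover over a million code points
First 128 characters are the same as ASCII - Unicode is a superset of ASCII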
Standardised
Greater range of characters and represents most modern languages, but also means that more storage is required for English
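
The storage trade-off can be checked with Python's built-in encoders. As an assumption for this sketch, the "utf-16-be" codec stands in for a 16-bit Unicode encoding.

```python
text = "Hello"

ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-be")   # 2 bytes per character here

print(len(ascii_bytes))   # 5
print(len(utf16_bytes))   # 10 (twice the storage for the same English text)

# ASCII cannot encode the euro sign, but Unicode can
print(ord("€"))                      # 8364
print(len("€".encode("utf-16-be")))  # 2

# The first 128 code points match ASCII
print(ord("A"), "A".encode("utf-8") == "A".encode("ascii"))  # 65 True
```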

1.2 Multimedia

Coding of Images →

Vectors →

Vectors store a set of instructions and mathematical formulae on how to draw each object. Consists of drawing objects defined in a drawing list. Defined by maths & geometry
Typically used for geometric objects
Can group individual elements
Needs to be rasterised to display and print
Image can be enlarged without becoming pixelated, as it stores instructions to make each image, which are recreated at a larger scale, and therefore can be used on many screen resolutions
Smaller file size, as it stores instructions instead of pixels, so it is faster and uses less bandwidth to upload and download
Contains:
- Drawing List: commands to define each object
- Geometric shapes/lines
- Coordinates
- Commands, formulae and attributes for each object e.g. colour, thickness, width
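
A drawing list can be pictured as a list of objects with coordinates and attributes. The structure below is a hypothetical sketch, not a real vector file format. Enlarging only multiplies the stored numbers, so no detail is lost.

```python
# Hypothetical drawing list: each object has a shape, coordinates and attributes
drawing_list = [
    {"shape": "line", "points": [(0, 0), (10, 5)], "colour": "black", "thickness": 1},
    {"shape": "circle", "points": [(5, 5)], "radius": 3, "colour": "red", "thickness": 2},
]

def scale_object(obj: dict, factor: float) -> dict:
    """Return a copy of a drawing object enlarged by the given factor."""
    scaled = dict(obj)
    scaled["points"] = [(x * factor, y * factor) for x, y in obj["points"]]
    if "radius" in obj:
        scaled["radius"] = obj["radius"] * factor
    return scaled

enlarged = [scale_object(obj, 2) for obj in drawing_list]
print(enlarged[0]["points"])   # [(0, 0), (20, 10)]
print(enlarged[1]["radius"])   # 6
```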

Bitmaps →

Consists of pixels (picture elements), which are the smallest identifiable component of a bitmap, defined by position and colour. These are arranged in a matrix
Larger file size
Enlarging makes the image appear pixelated
Can be compressed with a significant reduction in file size
Suitable for photos and scanned images
Less processing power
Can’t group individual elements
Colour depth = number of bits for each pixel. n bits = 2^n colours
File size (in bits) = width × height × colour depth. Divide by 8 for bytes
How they are encoded:
- Each image is split into pixels, which form a grid
- Each pixel is given a binary value and colour
- Sequence of binary numbers stored
- Suitable file format and metadata
- Converts from analogue to digital with an ADC
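
The colour depth and file size formulas can be worked through in a short sketch. The function names and the 1920 × 1080, 24-bit example are assumptions, and the file header is ignored.

```python
def colours(colour_depth_bits: int) -> int:
    """Number of distinct colours available with n bits per pixel."""
    return 2 ** colour_depth_bits

def bitmap_size_bits(width: int, height: int, colour_depth_bits: int) -> int:
    """Size of the raw pixel data in bits (header and metadata not included)."""
    return width * height * colour_depth_bits

bits = bitmap_size_bits(1920, 1080, 24)
print(colours(24))                    # 16777216
print(bits)                           # 49766400
print(bits // 8)                      # 6220800 bytes
print(round(bits / 8 / 1024**2, 2))   # 5.93 MiB
```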

File header →

A file header is a set of bytes at the beginning of a file that identifies the file type, helps confirm the file is not damaged and tells the OS how to handle it
Stores metadata about the file
Contains:
- Confirmation of file type
- File size
- Dimensions & resolution
- Colour depth
- Compression

Resolutions →

Image resolution: the amount of detail in an image; the total number of pixels in the image, the product of width and height. Pixel density is measured in dots per inch (dpi)
Screen resolution: a monitor specification; the number of pixels a screen can display, also the product of width and height

Coding of Sound →

Sound is analogue (continuous range, through measuring a physical property) and must be converted to digital (binary 0s and 1s) using an ADC (analogue to digital converter)

Sampling: recording the amplitude of the analogue sound wave at regular intervals to approximate the wave. Samples are encoded as binary values and stored in the order they appear in
Sampling rate: number of samples per unit time. Increasing this increases accuracy but also file size. Measured in hertz (Hz)
Sampling resolution: number of bits used to encode each sample (n bits gives 2^n distinct values). Also known as bit depth, and is typically 8, 16, 24 or 32 bits. A higher resolution increases accuracy and file size, and reduces quantization error and distortion
Band-limiting filter: a component of the sound encoder which removes high frequency components we don’t hear
Quantization: process of rounding each sampled amplitude to the nearest value that can be stored with the available bits per sample. The difference between the true and stored value is the quantization error
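
Sampling rate and sampling resolution can be illustrated with a sketch that samples a sine wave and rounds each sample to one of 2^n levels (the rounding is what introduces quantization error). The function name and the example values are assumptions; a real ADC works on a physical signal, not a formula.

```python
import math

def sample_and_quantize(freq_hz: float, sample_rate_hz: int,
                        bit_depth: int, duration_s: float) -> list:
    """Sample a sine wave at regular intervals and store each sample as a level."""
    levels = 2 ** bit_depth                     # distinct values per sample
    count = int(sample_rate_hz * duration_s)    # total number of samples
    samples = []
    for i in range(count):
        t = i / sample_rate_hz                  # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # between -1 and 1
        # map -1..1 onto 0..levels-1 and round to the nearest whole level
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples

samples = sample_and_quantize(freq_hz=1, sample_rate_hz=8, bit_depth=3, duration_s=1)
print(len(samples))                 # 8 samples in one second
print(min(samples), max(samples))   # 0 7 (3 bits give levels 0 to 7)
```

Raising sample_rate_hz gives more samples per second, and raising bit_depth gives finer levels; both increase the amount of data stored.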

File size (in bits) = sampling rate (Hz) × sampling resolution (bits) × time (seconds). Divide by 8 for bytes
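
A worked example of the formula. The channels parameter is an addition not in the notes (stereo stores two channels), and the 44,100 Hz, 16-bit, 60-second values are illustrative assumptions.

```python
def sound_size_bits(sample_rate_hz: int, resolution_bits: int,
                    seconds: int, channels: int = 1) -> int:
    """Raw sound data size in bits; `channels` is an assumed extra factor."""
    return sample_rate_hz * resolution_bits * seconds * channels

mono = sound_size_bits(44100, 16, 60)
print(mono)        # 42336000 bits
print(mono // 8)   # 5292000 bytes
print(sound_size_bits(44100, 16, 60, channels=2) // 8)  # 10584000 bytes
```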

1.3 Compression

Compression →

Compression is needed as data files are very large and would take a long time or a lot of bandwidth to send, and as emails may limit the size of attachments

Lossy →

Some information/data is lost and the file cannot be exactly reconstructed
Some data is deemed redundant and permanently removed
Results in loss of quality
Max compression: file can typically be reduced to around 10% of its original size
Sound: keeps sounds the human ear can process and discards what we can’t. Removes background noise and frequencies above human hearing
Images: reduce colour depth so there are fewer bits per pixel, or reduce resolution so there are fewer pixels overall, e.g. JPEG. The difference is largely unnoticeable to the human eye
MP3: uses psycho-acoustic modelling and perceptual music shaping. Certain parts are eliminated without significantly degrading the listener's experience. Removes sounds the human ear can’t hear and keeps what we can. Discards the softer of two simultaneous sounds

Lossless →

Relies on some form of replacement
Subsequent decoding can exactly recreate the file
Loses none of the original data
Max compression: file can typically be reduced to around 50% of its original size
Run-Length Encoding (RLE): A lossless compression algorithm. Identifies and indexes repeating sequences (runs) and encodes as two values: what is being repeated (run value) and how many times it is repeated (run count). May be preceded by a control character
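
A minimal sketch of RLE on a string, storing each run as a (run value, run count) pair. The function names are hypothetical and no control character is used.

```python
def rle_encode(data: str) -> list:
    """Encode a string as a list of (run value, run count) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs: list) -> str:
    """Rebuild the original string exactly from the run pairs."""
    return "".join(value * count for value, count in runs)

encoded = rle_encode("WWWWBBBWW")
print(encoded)               # [('W', 4), ('B', 3), ('W', 2)]
print(rle_decode(encoded))   # WWWWBBBWW
```

Data with few repeats does not benefit: "ABC" encodes to three pairs, which takes more space than the original.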